A Learning Technique to Determine Criteria for Multiple Document Summarization

Authors

  • Fatma Kallel Jaoua
  • Maher Jaoua
  • Lamia Hadrich Belguith
  • Abdelmajid Ben Hamadou
Abstract

In this paper we describe a new method of automatic summarization based on a learning step that identifies the criteria maximizing the correlation between human summaries and peer extracts. The proposed method uses a genetic algorithm to produce extracts from a collection of source documents describing the same event. These extracts are compared to human summaries using the ROUGE measure in order to identify the correlation between statistical and linguistic criteria and the ROUGE score. Experimental results are presented for a document set extracted from the DUC’06 evaluation conference.
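
To make the comparison step concrete, here is a minimal sketch of an n-gram recall score in the spirit of ROUGE-N. It is an illustration only, not the authors' implementation and not the official ROUGE toolkit: the function names `ngrams` and `rouge_n_recall` are hypothetical, tokenization is a plain whitespace split, and there is no stemming, stop-word handling, or support for multiple reference summaries.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-gram count.

    Assumption: lowercase whitespace tokenization, single reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    extract = "the cat sat"
    human_summary = "the cat sat on the mat"
    print(rouge_n_recall(extract, human_summary, n=1))
```

In a setup like the one the abstract describes, a score of this kind could serve as the fitness value that a genetic algorithm assigns to each candidate extract; how the authors actually combine it with their statistical and linguistic criteria is not specified here.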

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

A Hybrid Approach for Extractive Document Summarization Using Machine Learning and Clustering Technique

Usually, presence of the same information in multiple documents is the main problem faced in effective information access. Instead of this redundant information thus accessed or retrieved, users are interested in retrieving information that addresses one or other several aspects. In such situation, text summarization proves to be very useful. Not only in Information retrieval, but it is an extr...

A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization

Ordering information is a difficult but important task for applications generating natural-language text. We present a bottom-up approach to arranging sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g., sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. These criteria are integrated ...

Feature expansion for query-focused supervised sentence ranking

We present a supervised sentence ranking approach for use in extractive summarization. Using a general machine learning technique provides great flexibility for incorporating varied new features, which we demonstrate. The system proves quite effective at query-focused multi-document summarization, both for single summaries and for series of update summaries.

A Cluster Based Keyword Filtration Approach for Web Document Summarization

Summarization, an extremely important technique in Data Mining is an automatic learning technique aimed to extract the most valuable information from a large size document or the articles. The goal is to create the summary of the document, but substantially different from each other. Text Document summarization refers to the summarization of text documents based upon their content. The proposed...

Publication date: 2008